83 research outputs found

    A Neural Model for Generating Natural Language Summaries of Program Subroutines

    Source code summarization -- creating natural language descriptions of source code behavior -- is a rapidly growing research topic with applications to automatic documentation generation, program comprehension, and software maintenance. Traditional techniques relied on heuristics and templates built manually by human experts. Recently, data-driven approaches based on neural machine translation have largely overtaken template-based systems. But nearly all of these techniques rely almost entirely on programs having good internal documentation; without clear identifier names, the models fail to create good summaries. In this paper, we present a neural model that combines words from code with code structure from an AST. Unlike previous approaches, our model processes each data source as a separate input, which allows the model to learn code structure independent of the text in code. This process helps our approach provide coherent summaries in many cases even when zero internal documentation is provided. We evaluate our technique with a dataset we created from 2.1 million Java methods. We find improvement over two baseline techniques from SE literature and one from NLP literature.

    Automatically Extracting Subroutine Summary Descriptions from Unstructured Comments

    Summary descriptions of subroutines are short (usually one-sentence) natural language explanations of a subroutine's behavior and purpose in a program. These summaries are ubiquitous in documentation, and many tools such as JavaDocs and Doxygen generate documentation built around them. And yet, extracting summaries from unstructured source code repositories remains a difficult research problem -- it is very difficult to generate clean structured documentation unless the summaries are annotated by programmers. This becomes a problem in large repositories of legacy code, since it is cost prohibitive to retroactively annotate summaries in dozens or hundreds of old programs. Likewise, it is a problem for creators of automatic documentation generation algorithms, since these algorithms usually must learn from large annotated datasets, which do not exist for many programming languages. In this paper, we present a semi-automated approach via crowdsourcing and a fully-automated approach for annotating summaries from unstructured code comments. We present experiments validating the approaches, and provide recommendations and cost estimates for automatically annotating large repositories. Comment: 10 pages, plus references. Accepted for publication in the 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, London, Ontario, Canada, February 18-21, 2020.

    DNAV: A WebGL Based Tool for Visualizing the Twists and Turns in the Human Genome

    The human genome is tightly folded to fit within the restricted space of the nucleus. One of the key goals in understanding the folding principles of DNA is to unravel the mysteries of how functional elements that are separated from each other are brought together. Long-range interactions between folded segments of chromosomes form complex three-dimensional networks and are fundamental in controlling gene expression. These long-range interactions have been observed using chromosome conformation capture (3C). This Hi-C data contains a wealth of information on the nearest-neighbor influence on the deviation of the DNA axis that can be modeled theoretically. We have developed a tool using WebGL to visualize the modeled structures.

    Exact expectation values of local fields in quantum sine-Gordon model

    We propose an explicit expression for vacuum expectation values of the exponential fields in the sine-Gordon model. Our expression agrees both with semi-classical results in the sine-Gordon theory and with perturbative calculations in the Massive Thirring model. We use this expression to make new predictions about the large-distance asymptotic form of the two-point correlation function in the XXZ spin chain. Comment: 18 pages, harvmac.tex, 2 figures.

    Who changes the string coupling?

    In general bosonic closed string backgrounds the ghost-dilaton is not the only state in the semi-relative BRST cohomology that can change the dimensionless string coupling. This fact is used to establish complete dilaton theorems in closed string field theory. The ghost-dilaton, however, is the crucial state: for backgrounds where it becomes BRST trivial we prove that the string coupling becomes an unobservable parameter of the string action. For backgrounds where the matter CFT includes free uncompactified bosons we introduce a refined BRST problem by including the zero-modes "x" of the bosons as legal operators on the complex. We argue that string field theory can be defined on this enlarged complex and that its BRST cohomology captures accurately the notion of a string background. In this complex the ghost-dilaton appears to be the only BRST-physical state changing the string coupling. Comment: 34 pages, phyzzx.